\ASEC{Cognitive Economy in ANY Environment}

{\bf Summary:\  \sl 
A coherent theory of
cognitive economy must characterize the sources of power of its
methods, and describe ways in which they specialize for environments
having various properties.
While the body of this paper concentrated upon cognitive economy in {\it fluid}
environments, we hope to see such a comprehensive theory develop
in the future. As a step in that direction, we present (briefly, here in an
appendix)
a few techniques
for enhancing cognitive economy in {\it arbitrary} environments.  
We have already described
in great detail (Sec. 2.2) the
{\it caching} of intermediate results (intelligent redundancy)
and will now treat two additional methods: expectation-simplified
processing (intelligent focus of attention) and multiple levels of
abstraction (intelligent knowledge structuring).
}

	\ASSEC{Expectation-Simplified Processing}

{\bf Summary:\  \sl 
For efficiency's sake, an intelligent system should be willing and
able to add new facts, but should be {\it eager} to add surprising new facts.
Surprises can only be noticed by contrast with {\it expectations}, so an intelligent
system should maintain a context of expectations and filter incoming observations
against that.  Furthermore, expectations and surprises can aid an intelligent
system in comparing its model and processing of the domain to the real world.
Through such monitoring, discrepancies may be found and diagnosed, leading to
changes in the model that make it more consistent with observed behavior.  Our
discussion here centers on the importance of using such expectations
to focus and filter intelligent processing.}

\yskip

The world bombards our senses with data, much more than we can process
in detail in real time; yet we can't live in a hundred times real time.
We survive by ignoring most of that data, or, more precisely, by knowing
(or almost immediately deciding)
what can be ignored.  We maintain a set of expectations, against
which the incoming torrent is matched. Almost all of it will match those
expectations, and we then need merely process the unexpected inputs, the
surprising observations.  We reserve our computing for those opportunities
which promise us genuine new facts, rather than reconfirmation of known ones.

Such concentration on the abnormal is a natural side-effect of being guided
by expectations --- albeit usually stereotyped ones --- about situations,
events, and people.
One is able to zip through most of what is encountered, match it immediately
to the active schema, and move on. The only processing which is slow and
tricky is that dealing with the remaining anomalies, exceptions, puzzles.
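To make the filtering concrete, here is a minimal sketch, written in Python with
names of our own invention (it illustrates the idea only, and is not drawn from
any particular system): observations matched by an active expectation are
discarded cheaply, and only the surprises are passed along for expensive processing.

    from dataclasses import dataclass
    from typing import Any, Callable, List

    @dataclass
    class Expectation:
        matches: Callable[[Any], bool]     # a cheap predicate: does this datum fit?

    def filter_surprises(observations: List[Any],
                         expectations: List[Expectation]) -> List[Any]:
        # Keep only the observations that no active expectation accounts for;
        # everything else is matched quickly and ignored.
        return [obs for obs in observations
                if not any(e.matches(obs) for e in expectations)]

    # e.g., while reading we expect ordinary words; a nonsense token surfaces
    # as a surprise and receives the expensive processing.
    active = [Expectation(str.islower), Expectation(str.istitle)]
    print(filter_surprises(["the", "cat", "XQZZY!", "sat"], active))   # -> ['XQZZY!']
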

The more general position is that each slot carries not merely a default,
but also an indication of how important it is to confirm that value.
Some  slots  will  have  meta-comments  like  ``I'm  worth  computing   or
verifying'', or just  the opposite.  For example, in our  ``Room'' 
frame,  we have  a
default Ceiling, but in everyday life
we know not to squander time checking that each room
we're in has a ceiling.  Similarly for emergency exit signs:
they usually exist, and we rarely feel it's cost-effective to confirm that.
We have in
both situations a very fast algorithm we can apply if ever we are in a
situation in which we need that information.  For ceilings, it's
$\lambda\ (\ )\ $(look upwards); for exit
signs, it's ``look for red-on-white lettering.''\foo{
While this may seem to be a much more complex program to execute, recall that
if we truly need to run it --- say in a sudden power failure ---
we expect those lit signs to be one of the few things we could still see.}

The policy for whether, when, and how to check a default slot value can be
arbitrarily complex. The program may be designed to
reason at great length about how much time to expend re-confirming the expected.
It may reason about the task of reasoning about how much time to spend,
and so on.
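As an illustration, consider the following sketch (in Python, with a hypothetical
slot structure of our own devising rather than the format of any existing frame
system): a slot carries its default, its meta-comment about whether verification
is worthwhile, and the fast procedure to run on the rare occasion the true value
is genuinely needed.

    from dataclasses import dataclass
    from typing import Callable, Optional

    @dataclass
    class Slot:
        name: str
        default: object
        worth_verifying: bool                    # the slot's meta-comment
        quick_check: Optional[Callable[[], object]] = None   # fast algorithm, if any

        def value(self, have_time: bool = False) -> object:
            # Confirm the default only when the slot says it is worth doing
            # and we can spare the effort; otherwise trust the default.
            if self.worth_verifying and have_time and self.quick_check is not None:
                return self.quick_check()
            return self.default

    # The Room frame's Ceiling slot: default "present", rarely worth confirming,
    # but carrying a very fast procedure for the rare occasion we need it.
    ceiling = Slot("Ceiling", "present", worth_verifying=False,
                   quick_check=lambda: "whatever we see when we look upwards")
    print(ceiling.value())          # -> 'present'
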

\yskip

\noindent Several 
increases in efficiency are realizable through expectation-filtering:

(a). Perceptual set: Expectations allow you to see what you expect with less effort.
For emergency exit signs, when we need to find them, we know to look for horizontal
and vertical red lines; this is 99$\%$ effective. When we look at our red LED watch
and have a perceptual set to read the time, we of course would not mistake the
image for that of an EXIT sign.

(b). Surprise:  If we observe Y when we expected X, our attention is captured.
In one sense, our sensitivity to data inconsistent with
expectations is heightened.  We saw in (a), above, that reliance on
frames will dull our chance of noticing a surprising datum
(since we will tend to interpret any data as consistent
with the current frame).
This point (b)
says that {\it if} we do notice it, then it will have a significant impact.
Carefully building up an expectation and then abruptly shattering it with a
jarring surprise is the archetypal template for a good joke. 
As Arthur
Koestler observes, the jarring of conflicting matrices of thought is what
powers art ({\sl ah!}), science ({\sl aha!}) and humor ({\sl haha!}).

(c). Defer: By relying upon expectations, upon defaults, we can usually defer,
often forever, the processing needed to find the true value in the current case.  At least, when
we {\it do} finally have to make a particular perception, it may at that time
be easier to make. When the ceiling is falling, it's easier to see; when all the
lights go out except for the EXIT signs, it's easier to see them.
Generalizing, we may state that it's worth a little computing to reason
out whether (and maybe even how long) a slot-filling
can be ignored.

(d). Predict and Prepare:  This includes the whole continuum we discussed in
the Caching section: bind a variable, store the triple (F,x,F(x)), store enough
information so that F(x) could be recomputed very quickly, store information
which would make F'(x') more easily computable (for x' similar to x, and F'
similar to F), store away a partial model of the world with good and bad features
tagged.  Notice that moving along this continuum, programs grow
more flexible, more descriptive (less scalar), more dynamic.
In all cases, the
motivation for doing this storage is to (i) speed up the recognition of
(the need to do) similar computations in the future, and (ii) speed up
or even eliminate those computations when you decide they {\it are} relevant
and you want the values they would return.  These are the two uses of
scripts, but also of cached constants (e.g., caching the value returned
by OWNED-BY, a function from CARS to PEOPLE). A better example is {\it user models}:
they range from a single bit (e.g., a NOVICE/PRO toggle), 
to a few numbers (e.g., in PARRY),
to a dynamic concept-modelling scheme (e.g., in Eurisko).
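One point on that continuum, the storage of (F, x, F(x)) triples, can be sketched
as follows (Python again; the cache and function names are ours, and this is only
an illustration of the idea, not the mechanism of Section 2.2):

    from typing import Any, Callable, Dict, Tuple

    _cache: Dict[Tuple[str, Any], Any] = {}

    def cached_apply(f: Callable[[Any], Any], f_name: str, x: Any) -> Any:
        # Look up the triple (F, x, F(x)); compute and store it on a miss.
        key = (f_name, x)
        if key not in _cache:
            _cache[key] = f(x)       # pay the full cost only once
        return _cache[key]           # later requests are nearly free

    # e.g., caching the value returned by OWNED-BY, a function from CARS to PEOPLE
    def owned_by(car: str) -> str:
        return "the registered owner of " + car   # imagine an expensive lookup here

    print(cached_apply(owned_by, "OWNED-BY", "the red Ford"))
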

In the interests of cognitive economy, we conclude that programs should
build up expectations, monitor them,
and change their models as needed.  A natural mechanism for doing this is
the use of pattern-directed knowledge modules (PDMs), which can react to
changing situations, to sudden disconfirmations of predictions.
These can be used as triggers or demons; e.g., if
you hear someone shout ``Fire!'' in a theater, that takes precedence over
Jane Fonda.
The second use of PDMs arises when an expectation is not met: the program then
also  has some chance of  modifying the culprit
rule(s) that made that false prediction (and/or
adding new rule(s) which would make the right one).
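Both uses can be caricatured in a few lines (a Python sketch with invented names,
not a description of any existing PDM system): a demon triggers when its pattern
matches the current observation, and a disconfirmed prediction records which rule
deserves the blame.

    from typing import Any, Callable, List

    class Demon:
        # A pattern-directed module: a trigger pattern plus an action to run.
        def __init__(self, pattern: Callable[[Any], bool],
                     action: Callable[[Any], None]):
            self.pattern = pattern
            self.action = action

        def notice(self, datum: Any) -> bool:
            if self.pattern(datum):
                self.action(datum)     # the trigger pre-empts other processing
                return True
            return False

    suspect_rules: List[str] = []      # rules whose predictions were disconfirmed

    def check_prediction(rule_name: str, predicted: Any, observed: Any) -> None:
        # Blame assignment: a failed expectation marks the rule that made it
        # as a candidate for modification (or for adding a corrected rule).
        if predicted != observed:
            suspect_rules.append(rule_name)

    fire_demon = Demon(lambda d: d == "someone shouts Fire!",
                       lambda d: print("drop everything and head for the EXIT"))
    fire_demon.notice("someone shouts Fire!")        # the trigger fires
    check_prediction("quiet-theater-rule", "silence", "someone shouts Fire!")
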

%[[Mention TERIESIAS??]]	
%
%[[ignore:	This ties learning in very closely to scientific theory
%			formation (at least the theories of knowledge which
%			are activist, revolutionary conventionalist activist,
%			methodological falsificationist, sophisticated.
%			[those are the nodes descending in a genl/spec hier
%			of theories of knowledge]
%
%]]

	\ASSEC{Levels of Abstraction}

{\bf Summary:\  \sl 
An intelligent system's domain model is typically an approximation of reality.
It is often the case that knowledge exists at several levels of detail and
abstraction.  The amount of detail required in intelligent processing depends
on the problem-solving context.  By appropriately structuring knowledge, a
problem-solver can take advantage of abstraction by reasoning at the level
appropriate to the given situation.  We argue here that abstraction
is crucial for intelligent systems which must reason over large
knowledge bases or within widely-varying time constraints.}

\yskip

%[[mention McCarthy's common-sense reasoning example: a spilling coffee cup;
%we don't try to integrate the hydrodynamic differntial equations]]


In many real-life problem solving situations, the ``goal" is not a single,
precise answer. Rather, there is a cost-benefit tradeoff between the
accuracy of the solution and the amount of resources consumed in producing
it.  
Computer programs which are designed only to return exact answers
lie at one extreme end of this curve, and rarely near the optimal point (of
diminishing returns).

This presupposes a kind of ``continuity'' in the
environment, as we discussed in Appendix 2.1. Namely, truncated search is
assumed to be better than random selection of a legal move. 
Almost paradoxically, if the environment
is mercurial, then the optimal point on the tradeoff will cause our program to
be very {\it expressive}. 

Cognitive systems which must cope with the world have an omnipresent need to
approximate reality --- a need which is well met by maintaining multiple
representations of the world  at various levels of abstraction;
i.e., an entry in the knowledge base has all its generalizations also
stored explicitly, redundantly. 
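A minimal sketch of such redundant storage (Python, with illustrative data and
names of our own): each entry carries its chain of generalizations explicitly, so
a query can be answered at whatever level of abstraction the situation warrants.

    from typing import Dict, List

    # Each entry is stored with its whole chain of generalizations, explicitly
    # and redundantly, most specific first.
    abstraction_chains: Dict[str, List[str]] = {}

    def add_entry(entry: str, chain: List[str]) -> None:
        abstraction_chains[entry] = [entry] + chain    # cached, not recomputed

    def describe(entry: str, level: int) -> str:
        # level 0 is the exact entry; larger levels give coarser approximations.
        chain = abstraction_chains[entry]
        return chain[min(level, len(chain) - 1)]

    add_entry("my dented '73 sedan",
              ["sedan", "automobile", "vehicle", "physical object"])
    print(describe("my dented '73 sedan", 2))    # -> 'automobile'
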

In addition to aiding us in quickly approximating reality,\foo{
In Section 1, we described a situation in which a hospital simulation
program might be required upon occasion to provide as good an answer as
possible to a query within a given (short) time period. }
multiple levels
of abstraction (``MLA") are useful for
{\it analogizing}: they can 
make that process easier, faster, and occasionally even
more valid. Analogies can be recognized when two distinct concepts coincide
at some abstract level.\foo{Conversely, if (corresponding
slots of) two apparently related
concepts (such as two homonyms) do not share related structure at higher
levels of abstraction, that suggests that an analogy between them would be
superficial and lacking in power.  }


Multiple levels of abstraction cost a certain
amount to implement, both initially (man-hours of programming) and as the
program runs (cpu time spent maintaining the redundancy, choosing which level
to work on next, etc.) For problems of all sizes, this feature aids in the
expression of new problems to the program, since they may be input at whatever
level(s) seem appropriate to the user. Thus multiple levels of abstraction
will always add to expressiveness, and will either detract from or add to
efficiency, depending upon how small or large the task is.
MLA is very useful when the program's resources are
curtailed or, even better, varied dramatically in magnitude from run to run
(a program which was {\it always} limited to a 15-second execution time would
be designed specifically for that task, but a program which might get anywhere
from 5 seconds to 5 hours for the same simulation question could better
benefit from multiple levels of abstraction.)
MLA
contributes more to efficiency
as the quotient (task size)/(resource allotment) grows.
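The point about widely varying resource allotments can be illustrated with a small
sketch (ours; the costs and level descriptions are hypothetical): the program
returns the most detailed answer it can afford, falling back to coarser
abstractions when time is short.

    from typing import Any, List, Tuple

    def answer(levels: List[Tuple[float, Any]], seconds_available: float) -> Any:
        # levels: (estimated cost in seconds, answer at that level), ordered from
        # most abstract (cheapest) to most detailed (dearest).  Return the most
        # refined answer the resource allotment permits.
        best = None
        for cost, result in levels:
            if cost <= seconds_available:
                best = result          # we can afford a finer-grained answer
            else:
                break
        return best

    simulation = [(1.0,     "rough average: about 40 patients per day"),
                  (10.0,    "queueing-model estimate: 43, plus or minus 5"),
                  (18000.0, "event-by-event simulation result")]
    print(answer(simulation, 15.0))    # 15 seconds buys the queueing-model estimate
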


There is a more general result here: the methods which humans find useful for
easy expression of problems may be useful to programs attacking very large
problems.  For instance, consider the decision about whether to do inference
by a system of pattern-invoked experts, or rather to preprocess the triggering
conditions into a big discrimination network. The former approach is more
expressive, the latter more efficient --- at least for small problems. For
very large tasks, the need to remain flexible grows very large. It is then
more economical to retain pattern-directed control (the compilation of the
patterns is so complex that the whole procedure would have to be redone
frequently during the course of execution if we opted for that alternative
instead).  A case in point is the HARPY speech understanding program 
[Lowerre & Reddy 1979]:
whenever the grammar changes --- at any level --- the entire speech recognition
network must be recompiled. HEARSAY II [Lesser and Erman 1977],
while slower in performance, has
sufficient flexibility (from several independent pattern-directed knowledge
sources) to obviate such monolithic integrative recompilation delays.
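The contrast can be caricatured in a few lines of Python (an illustration of our
own, not a description of HARPY or Hearsay-II themselves): with pattern-directed
control a new rule is simply appended, while the compiled table must be rebuilt
from scratch whenever anything changes.

    from typing import Callable, Dict, List, Tuple

    # Pattern-directed control: every rule's trigger is re-examined at run time,
    # so a new rule can be added at any moment without touching the others.
    rules: List[Tuple[str, Callable[[str], bool]]] = [
        ("rule-A", lambda w: w.startswith("a")),
        ("rule-Z", lambda w: w.endswith("z")),
    ]

    def pattern_directed(word: str) -> List[str]:
        return [name for name, test in rules if test(word)]

    # Compiled control: the triggering conditions are preprocessed into a single
    # lookup table.  It runs faster, but the whole table must be rebuilt whenever
    # any rule (or the vocabulary it was compiled against) changes.
    def compile_table(vocabulary: List[str]) -> Dict[str, List[str]]:
        return {w: pattern_directed(w) for w in vocabulary}

    table = compile_table(["abuzz", "adze", "quartz"])
    rules.append(("rule-Q", lambda w: "q" in w))          # cheap for the first scheme...
    table = compile_table(["abuzz", "adze", "quartz"])    # ...but forces recompilation here
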

\vfill\eject
\NNSECP{References}

\parskip 10pt plus 1pt minus 1pt \lineskip 1pt

Adams, J. L.,
{\it Conceptual Blockbusting},
W.H. Freeman: San Francisco, 1974.

Anderson, R. H. and J. J. Gillogly,  ``Rand intelligent terminal agent
(RITA):  design philosophy.''  R-1809-ARPA, The Rand Corporation,
Santa Monica, Calif., 1976.

Balzer, R., N. Goldman, and D. Wile, ``Informality in program
specifications,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 389-397.

Barstow, D., ``A knowledge-based system for automatic program
construction,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 382-388.

Berliner, H. J., ``Chess as problem solving: the development of a tactics
analyzer,'' Ph.D. Dissertation, Carnegie-Mellon University, Pittsburgh, 1974.

Bobrow, D. G. and B. Raphael, ``New programming languages for
artificial intelligence research,''
{\it Computing Surveys}, {\bf 6},
1974, 153-174.

Bobrow, D. G. and T. Winograd, ``An overview of KRL, a knowledge
representation language,''
{\it Cognitive Science}, {\bf 1},
1977, 3-46.

Brachman, R., ``Theoretical studies in natural language
understanding'', BBN Report 3888, September, 1978.

Burstall, R. M. and J. Darlington, ``A transformation system for
developing recursive programs,''
{\it J. ACM}, {\bf 24},
1977, 44-67.

Cohen, H., ``What is an Image?'', {\it Proc. Sixth International Joint
Conference on Artificial Intelligence,} Tokyo, August, 1979,  1028-51.

Collins, A. M. and M. R. Quillian, ``Retrieval time from semantic memory,''
{\it J. Verbal Learning and Verbal Behavior}, {\bf 8},
1969, 240-247.

Collins, A. M. and M. R. Quillian, ``How to make a language user,'' in
(E. Tulving and D. Donaldson, eds.)
{\it Organization of Memory},
Academic Press: New York, 1972.

Conrad, C.  ``Cognitive economy in semantic memory,''
{\it J. Experimental Psychology}, {\bf 92},
1972, 149-154.

Darlington, J. and R. M. Burstall, ``A system which automatically improves
programs,''
{\it Proc. of the 3rd Int. Joint Conf. on Artificial Intelligence},
Stanford, Calif., 1973, 479-485.

Dijkstra, E. W.,
{\it A Discipline of Programming},
Prentice-Hall: Englewood Cliffs, N.J., 1976.

Feigenbaum, E. A.  ``The art of artificial intelligence:  themes and
case studies of knowledge engineering,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 1014-1029.

Fikes, R. E., P. E. Hart, and N. J. Nilsson, ``Learning and executing
generalized robot plans,''
{J. Artificial Intelligence}, {\bf 3},
1972, 251-288.

Fiksel, J. R. and G. H. Bower, ``Question answering by a semantic
network of parallel automata,''
{\it J. Math. Psych.}, {\bf 13},
1976, 1-45.

Gelernter, H., ``Realization of a geometry-theorem proving machine,'' in
(E. A. Feigenbaum and Julian Feldman, eds.),
{\it Computers and Thought},
McGraw-Hill: New York, 1963, 134-152.

Green, C., ``A summary of the PSI program synthesis system,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 380-381.

Green, C. and D. Barstow,  ``On program synthesis knowledge,''
{\it J.  Artificial Intelligence}, {\bf 10},
1978, 241-279.

Hayes-Roth, B. and F. Hayes-Roth,  ``Plasticity in memorial networks,''
{\it J. Verbal Learning and Verbal Behavior}, {\bf 14},
1975, 506-522.

Hayes-Roth, B. and F. Hayes-Roth,   ``The prominence of lexical information
in memory representations of meaning,'' {\it J. Verbal Learning
and Verbal Behavior}, {\bf 16}, 1977, 119-136.

Hayes-Roth, B. and C. Walker,   ``Configural effects in human memory,''
{\it Cog\-ni\-tive Sci\-ence},
in press, 1979.

Hayes-Roth, F., P. Klahr, J.  Burge, and D. J. Mostow, ``Machine
methods for acquiring, learning, and applying knowledge,'' P-6241,
The Rand Corporation, Santa Monica, Calif., 1978.

Hayes-Roth, F. and V. R. Lesser,   ``Focus of attention in the Hearsay-II
speech understanding system,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 27-35.

Kahn, K., ``Making Aesthetic Choices'', {\it Proc. 6IJCAI,}
Tokyo, 1979,  448-50.

Kant, E., ``The selection of efficient implementations for a high level
language,''
{\it Proc. ACM SIGART-SIGPLAN Symp. on Artificial Intelligence and Programming
Languages},
SIGART Newsletter 64, 1977, 140-146.

Klahr, P., ``Planning techniques for rule selection in deductive
ques\-tion-ans\-wer\-ing,'' in (D. A. Waterman and F. Hayes-Roth, eds.),
{\it Pattern-Directed Inference Systems},
Academic Press: New York, 1978.

Knuth, D., {\it Fundamental Algorithms
(The Art of Computer Programming, Volume 1)}, Addison-Wesley: Reading, Mass., 1968.

Kuhn, T., {\it The Structure of Scientific Revolutions}, 1972.

Lakatos, I., {\it Proofs and Refutations},
Cambridge University Press: Cambridge, 1976.

Lenat, D. B., ``Beings:  knowledge as interacting experts,''
{\it Proc. of the 4th Int. Joint Conf. on Artificial Intelligence},
Tbilisi, USSR, 1975, 126-133.

Lenat, D. B., ``AM: an artificial intelligence approach to discovery in
mathematics as heuristic search,''  Memo AIM-286, Stanford AI Lab, 1976.

Lesser, V. R. and L. D. Erman, ``A retrospective view of the Hearsay-II
architecture,''
{\it Proc. of the 5th Int. Joint Conf. on Artificial Intelligence},
Cambridge, Mass., 1977, 790-800.

Low, J., ``Automatic coding:  choice of data structures,''  Memo AIM-242,
Stanford University, 1974.

Lowerre, B. and D. R. Reddy, ``The Harpy system,'' in (W. Lea, ed.),
{\it Recent Trends in Speech Recognition},
Prentice-Hall: Englewood Cliffs, N. J., in press.

McCarthy, J., ``The advice taker,'' in (M. Minsky, ed.),
{\it Semantic Information Processing,}
MIT Press: Cambridge, Mass., 1968.

Mostow, J. and F. Hayes-Roth,  ``Machine-aided heuristic programming: a
paradigm for knowledge engineering,'' N-1007-NSF, The Rand Corporation,
Santa Monica, Calif., 1979.

Newell, A., J. C. Shaw, and H. A. Simon,  ``Chess-playing programs
and the problem of complexity,'' in (Feigenbaum and Feldman, eds.)
{\it Computers and Thought},
McGraw-Hill: New York, 1963, 39-70.

Newell, A. and H. A. Simon, 
{\it Human Problem Solving},
Prentice-Hall: Englewood Cliffs, N.J., 1972.

Nilsson, N., {\it Problem-Solving Methods in Artificial Intelligence}, McGraw-Hill: New York, 1971.

Rich, E., ``Building and Exploiting User Models'', {\it Proc. 6IJCAI,}
Tokyo, 1979,  720.

Rips, L. J., E. J. Shoben, and E. E. Smith, ``Semantic distance and
the verification of semantic relations,''
{\it J. Verbal Learning and Verbal Be\-ha\-vi\-or}, {\bf 12},
1973, 1-20.

Samuel, A., ``Some studies in machine learning using the game of checkers,'' in
(E. A. Feigenbaum and Julian Feldman, eds.),
{\it Computers and Thought},
McGraw-Hill: New York, 1963.

Sandewall, E., private communication, August 22, 1979.

Stefik, M.  ``An examination of a frame-structured representation system,''
Memo HPP-78-13, Stanford University, 1978.

Waterman, D. A., R. H. Anderson, F. Hayes-Roth, P. Klahr, G. R. Martins, and S.
Rosenschein, ``Design of a rule-oriented system for implementing
expertise,''  The Rand Corporation, Santa Monica, Calif., 1979.

Waterman, D. A. and F. Hayes-Roth (eds.), 
{\it Pattern-Directed Inference Systems},
Academic Press: New York, 1978.


\vfill\end